
We would like to thank the reviewers for their thoughtful feedback and comments, which would undoubtedly make the

Neural Information Processing Systems

We will update our paper to reflect your comments, fix typos, and include missing references. We will update the paper to make this more overt. Eq. 4 is therefore chosen ... Both Eq. 3 and Eq. 4 are motivated by the policy improvement theorem. Whereas Eq. 3 seeks to improve the policy by choosing a better action to copy, Eq. 4 does so in a soft manner.

R2 - reproducibility: We have open-sourced the code for CRR on GitHub, and the link will be made available.
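The contrast between Eq. 3 and Eq. 4 can be illustrated with a small sketch. The response does not restate the equations, so this assumes that Eq. 3 is a hard indicator on a positive advantage estimate and that Eq. 4 is an exponentiated advantage with a temperature. The function names, the `beta` parameter, and the `max_weight` clipping value are hypothetical and are not taken from the released CRR code.

```python
import math


def binary_weight(advantage: float) -> float:
    """Hard filter (assumed form of Eq. 3): copy an action only if its
    estimated advantage is strictly positive."""
    return 1.0 if advantage > 0.0 else 0.0


def exp_weight(advantage: float, beta: float = 1.0, max_weight: float = 20.0) -> float:
    """Soft filter (assumed form of Eq. 4): the weight grows smoothly with
    the advantage. beta and max_weight are illustrative assumptions."""
    return min(math.exp(advantage / beta), max_weight)


# Hypothetical advantage estimates for four dataset actions.
advantages = [-1.0, 0.0, 0.5, 2.0]
hard = [binary_weight(a) for a in advantages]
soft = [exp_weight(a) for a in advantages]

for a, h, s in zip(advantages, hard, soft):
    print(f"advantage={a:+.1f}  hard={h:.0f}  soft={s:.3f}")
```

Under these assumptions, the hard filter discards every action whose advantage is not positive, while the soft filter keeps all actions and only down-weights the worse ones.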